The phylotype-based approach yielded 197 OTUs at the genus level, while the OTU-based approach generated 11,257 OTU clusters at 97% identity. The phylogeny-based approach generated 58,929 tree nodes, which were taxonomically classified at 97% identity. Manual review and annotation should reduce these to non-redundant results.
Figure 7: Unfiltered and curated OTU abundance. The visual representation of taxon terms highlights the most abundant taxa, based on how frequently each is assigned to an OTU or tree node. Muribaculaceae is the most frequently assigned family, and Muribaculaceae_ge is the genus-level label assigned to the most sequences.
OTUs
[1] "Otu001" "Otu002" "Otu003" "Otu004" "Otu005" "Otu006" "Otu007"
[8] "Otu008" "Otu010" "Otu011" "Otu012" "Otu013" "Otu014" "Otu016"
[15] "Otu017" "Otu019" "Otu020" "Otu021" "Otu022" "Otu023" "Otu024"
[22] "Otu026" "Otu028" "Otu029" "Otu031" "Otu032" "Otu037" "Otu038"
[29] "Otu041" "Otu045"
Phylum
[1] "Bacteroidetes" "Firmicutes"
Class
[1] "Bacilli" "Bacteroidia" "Clostridia"
[4] "Erysipelotrichia"
Order
[1] "Anaeroplasmatales" "Bacteroidales" "Bifidobacteriales"
[4] "Clostridiales" "Erysipelotrichales" "Lactobacillales"
[7] "Mollicutes_RF39" "Verrucomicrobiales"
Family
[1] "Akkermansiaceae" "Bacteroidaceae"
[3] "Bifidobacteriaceae" "Clostridiaceae_1"
[5] "Clostridiales_vadinBB60_group" "Erysipelotrichaceae"
[7] "Lachnospiraceae" "Lactobacillaceae"
[9] "Mollicutes_RF39_fa" "Muribaculaceae"
[11] "Peptococcaceae" "Rikenellaceae"
[13] "Ruminococcaceae"
Genus
[1] "Acetatifactor" "Akkermansia"
[3] "Alistipes" "Anaerotruncus"
[5] "Bacteroides" "Bifidobacterium"
[7] "Candidatus_Arthromitus" "Clostridiales_vadinBB60_group_ge"
[9] "Lachnoclostridium" "Lachnospiraceae_NK4A136_group"
[11] "Lachnospiraceae_UCG.001" "Lactobacillus"
[13] "Mollicutes_RF39_ge" "Muribaculaceae_ge"
[15] "Oscillibacter" "Roseburia"
[17] "Ruminiclostridium" "Ruminiclostridium_5"
[19] "Ruminiclostridium_9" "Ruminococcaceae_ge"
[21] "Turicibacter"
Figure x. Rank abundance of eight selected samples. Package: goeveg
Figure x. Correlation between species identified at the phylum level. Species are ordered alphabetically (top panel) and heuristically (bottom panel).
Figure x: Stacked barplots for species richness. The estimated richness (green bars) was calculated using the chao calculator and the observed richness (red bars) was calculated using sobs.
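The difference between the two calculators can be illustrated with a minimal sketch. It assumes the bias-corrected form of the Chao1 estimator, S_obs + n1(n1 - 1) / (2(n2 + 1)), where n1 and n2 are the numbers of singleton and doubleton taxa; whether the pipeline used exactly this variant is an assumption, and the count vector is hypothetical.

```python
def sobs(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 estimate: S_obs + n1*(n1-1) / (2*(n2+1))."""
    n1 = sum(1 for c in counts if c == 1)  # singletons
    n2 = sum(1 for c in counts if c == 2)  # doubletons
    return sobs(counts) + n1 * (n1 - 1) / (2 * (n2 + 1))


# Hypothetical read counts per OTU in one sample (not taken from the study).
sample = [10, 4, 2, 2, 1, 1, 1, 0]
print(sobs(sample))   # 7
print(chao1(sample))  # 8.0
```

The estimate exceeds the observed count whenever a sample contains at least two singletons, which is why the estimated bars are never shorter than the observed ones.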
Figure x: Species richness (observed species) displayed by boxplot (A), density plots (B) and histograms (C).
Figure x: Correlation between species richness and sequence depth. Observed species richness calculated using sobs (A) and estimated species richness calculated using the chao calculator (B).
Figure x: Species diversity and correlation to species richness. Phylogenetic diversity (C) correlates well with species richness.
Figure x: Species diversity estimates as a function of sample size. Only species with abundance greater than or equal to 1 are detected in the sample.
Figure x. Rarefaction and extrapolation curves. Sample-size-based curve (A), sample completeness curve (B) and coverage-based curve (C).
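As a rough illustration of the interpolation (rarefaction) part of a sample-size-based curve, the sketch below uses the classic expected-richness formula for a random subsample drawn without replacement. It is not necessarily the estimator used to draw the figure, and the function name and counts are hypothetical.

```python
from math import comb


def rarefied_richness(counts, n):
    """Expected number of taxa in a random subsample of n reads."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total read count")
    # A taxon with c reads is missed with probability C(total-c, n) / C(total, n).
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)


# Hypothetical sample with three taxa and ten reads.
counts = [5, 3, 2]
curve = [rarefied_richness(counts, n) for n in range(0, 11)]
```

The curve starts at zero, rises monotonically and reaches the observed richness when the subsample equals the full sample.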
OTU-based
Number_clusters Value_Index
2.0000 22.3718
Phylum
Number_clusters Value_Index
2.0000 114.5654
Class
Number_clusters Value_Index
2.0000 73.9034
Order
Number_clusters Value_Index
2.00 42.17
Family
Number_clusters Value_Index
2.000 31.017
Genus
Number_clusters Value_Index
2.000 24.511
Figure x: Optimal number of OTU clusters. The suggested optimal number of clusters (dotted line) that could explain most of the variation is 2 for OTUs (A), 3 for phylum (B), 3 for class (C), 2 for order (D), 10 for family (E) and 2 for genus (F). A high average silhouette width indicates high-quality clustering.
cluster size ave.sil.width
1 1 232 0.67
2 2 128 0.25
cluster size ave.sil.width
1 1 240 0.68
2 2 120 0.24
cluster size ave.sil.width
1 1 239 0.67
2 2 121 0.22
cluster size ave.sil.width
1 1 239 0.67
2 2 121 0.22
cluster size ave.sil.width
1 1 229 0.67
2 2 131 0.19
cluster size ave.sil.width
1 1 241 0.68
2 2 119 0.29
Figure x: Silhouette plot guided by the best number of clusters. Observations with a large Si (close to 1) are very well clustered. A small Si (around 0) means that the observation lies between two clusters, while observations with a negative Si are probably placed in the wrong cluster.
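To make the interpretation of Si concrete, here is a minimal sketch of the standard silhouette width, s(i) = (b - a) / max(a, b), where a is the mean distance from observation i to the other members of its own cluster and b is the smallest mean distance to any other cluster. The toy points and labels are hypothetical, and at least two clusters are assumed.

```python
def silhouette_widths(dist, labels):
    """Silhouette width of each observation from a full distance matrix."""
    n = len(labels)
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # a singleton cluster gets a width of 0 by convention
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == k) / labels.count(k)
            for k in set(labels) if k != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


# Hypothetical one-dimensional points; the third point is deliberately mislabelled.
points = [0, 1, 2, 10]
labels = [1, 1, 2, 2]
dist = [[abs(p - q) for q in points] for p in points]
widths = silhouette_widths(dist, labels)
```

In this toy example the third point sits next to cluster 1 but is labelled as cluster 2, so its width is negative.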
Figure x: Scree plot of PCA, showing which components explain most of the variability in the data. Over 80% of the variance in the OTU and taxonomy data is retained by the first two principal components. The first PC explains the largest share of the variation in the data set.
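The quantities read off a scree plot are simple ratios of eigenvalues. A minimal sketch follows, with hypothetical eigenvalues chosen only to illustrate the arithmetic (they are not the values from this analysis).

```python
from itertools import accumulate


def explained_variance(eigenvalues):
    """Return per-component and cumulative proportions of variance."""
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    return proportions, list(accumulate(proportions))


# Hypothetical eigenvalues of a PCA, sorted in decreasing order.
proportions, cumulative = explained_variance([6.0, 2.5, 1.0, 0.5])
# cumulative[1] is the share retained by the first two components.
```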
While PCA is based on Euclidean distances, PCoA is based on the (dis)similarity matrix calculated from OTU abundance data as described earlier. In any successful PCA or PCoA, the first few axes are expected to capture most of the variation in the input data. NMDS instead replaces the original distances with their ranks. Unlike in PCA and PCoA, the NMDS ordination axes are not ordered according to the variance they explain; instead, a plot of stress values (a measure of goodness-of-fit) against dimensionality can be used to choose an appropriate number of dimensions. Note that stress values >0.2 are generally considered hard to interpret, whereas values <0.1 are good and values <0.05 are better still. In any case, the inflexion point on scree plots and Shepard plots (stress plots) can be used to guide the selection of the minimum number of dimensions to use when interpreting the multidimensional data.
Figure x: Principal coordinate analysis ordination using a Bray-Curtis dissimilarity matrix. Objects that are ordinated closer together have smaller dissimilarity values than those ordinated further apart. A successful PCoA will capture most of the variation in the (dis)similarity matrix in a few PCoA axes.
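As an illustration of the stress criterion, the sketch below computes Kruskal's stress type 1, sqrt(sum((d - d_hat)^2) / sum(d^2)), from ordination distances d and fitted disparities d_hat, and maps a stress value onto the thresholds quoted above. The labels, and the "intermediate" band between 0.1 and 0.2 in particular, are assumptions added for illustration.

```python
from math import sqrt


def stress1(ordination_dists, disparities):
    """Kruskal's stress type 1 for paired distances and disparities."""
    num = sum((d - dh) ** 2 for d, dh in zip(ordination_dists, disparities))
    den = sum(d ** 2 for d in ordination_dists)
    return sqrt(num / den)


def describe_stress(stress):
    """Rough verbal reading of a stress value (labels are illustrative)."""
    if stress > 0.2:
        return "hard to interpret"
    if stress < 0.05:
        return "very good"
    if stress < 0.1:
        return "good"
    return "intermediate"  # assumed label; this band is not characterised above
```

For example, `describe_stress(0.15)` falls in the intermediate band, while a perfect monotone fit (`stress1(d, d)`) gives a stress of zero.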
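For reference, the Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances. A minimal sketch with hypothetical abundance vectors:

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den


# Hypothetical OTU counts for two samples.
sample_a = [10, 0, 5]
sample_b = [5, 5, 5]
d_ab = bray_curtis(sample_a, sample_b)  # 10 / 30
```

The value is 0 for identical samples and 1 for samples that share no taxa.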
OTUs
----------------------------
Call:
metaMDS(comm = otu.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(otu.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.1182664
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(otu.t[, -1]))'
Phylum
----------------------------
Call:
metaMDS(comm = phylum.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(phylum.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.1201002
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(phylum.t[, -1]))'
Class
----------------------------
Call:
metaMDS(comm = class.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(class.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.1402302
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(class.t[, -1]))'
Order
----------------------------
Call:
metaMDS(comm = order.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(order.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.143402
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(order.t[, -1]))'
Family
----------------------------
Call:
metaMDS(comm = family.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(family.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.1343065
Stress type 1, weak ties
No convergent solutions - best solution after 20 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(family.t[, -1]))'
Genus
----------------------------
Call:
metaMDS(comm = genus.t[, -1], distance = "bray", k = 3, try = 10, display = c("sites"), choices = c(1, 2), type = "t", shrink = FALSE)
global Multidimensional Scaling using monoMDS
Data: wisconsin(sqrt(genus.t[, -1]))
Distance: bray
Dimensions: 3
Stress: 0.0001578126
Stress type 1, weak ties
Two convergent solutions found after 10 tries
Scaling: centring, PC rotation, halfchange scaling
Species: expanded scores based on 'wisconsin(sqrt(genus.t[, -1]))'
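The `Data:` lines in the output above show that the abundances were square-root transformed and then Wisconsin double standardized before the dissimilarities were computed. As a minimal sketch of that transformation (each taxon divided by its maximum, then each sample divided by its total), with a hypothetical function name and toy table:

```python
from math import sqrt


def wisconsin_sqrt(table):
    """Square root, then Wisconsin double standardization.

    table: list of rows (samples), each a list of taxon abundances.
    """
    rooted = [[sqrt(v) for v in row] for row in table]
    # Step 1: divide each taxon (column) by its maximum.
    col_max = [max(col) for col in zip(*rooted)]
    by_max = [[v / m if m > 0 else 0.0 for v, m in zip(row, col_max)]
              for row in rooted]
    # Step 2: divide each sample (row) by its total.
    result = []
    for row in by_max:
        total = sum(row)
        result.append([v / total if total > 0 else 0.0 for v in row])
    return result


# Hypothetical 2-sample x 2-taxon table.
transformed = wisconsin_sqrt([[4, 0], [16, 9]])
print(transformed)  # [[1.0, 0.0], [0.5, 0.5]]
```

The transformation damps the influence of highly abundant taxa so that they do not dominate the Bray-Curtis dissimilarities.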
Figure X. Shepard and non-metric multidimensional scaling plots. Green points represent samples and red points represent OTUs or species. Similar samples are ordinated together. Stress values are shown at the bottom of each ordination plot.
Figure x: Sample Phylip or Newick-formatted tree clustered using the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm. The same data were used to construct different tree layouts, including rectangular (A), circular (B) and unrooted (C), to view how samples were clustered.
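The UPGMA procedure itself can be sketched in a few lines: repeatedly merge the two closest clusters and set the distance from the merged cluster to every other cluster to the size-weighted mean of the two old distances. The version below is a simplified illustration that emits a Newick topology without branch lengths; the sample names and distances are hypothetical, and it is not the tool used to build the trees in the figure.

```python
def upgma(names, dist):
    """Return a Newick topology string from a symmetric distance matrix."""
    n = len(names)
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (label, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of active clusters
        (la, sa), (lb, sb) = clusters.pop(a), clusters.pop(b)
        new_d = {}
        for k in clusters:
            dak = d[(min(a, k), max(a, k))]
            dbk = d[(min(b, k), max(b, k))]
            # Size-weighted mean keeps every original sample equally weighted.
            new_d[(k, next_id)] = (sa * dak + sb * dbk) / (sa + sb)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = (f"({la},{lb})", sa + sb)
        next_id += 1
    label, _ = next(iter(clusters.values()))
    return label + ";"


# Hypothetical distances between four samples.
names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 10, 10],
    [2, 0, 10, 10],
    [10, 10, 0, 4],
    [10, 10, 4, 0],
]
print(upgma(names, dist))  # ((A,B),(C,D));
```

The resulting string can be opened in any Newick-aware tree viewer to produce rectangular, circular or unrooted layouts of the same topology.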